Abstract


In this project, we designed and experimented with various autoencoder architectures aimed
toward converting grayscale images into their RGB counterparts. We used the 32x32 RGB
images of the CIFAR-10 dataset to test how well the models convert grayscale versions back into
their original RGB formats. The focus of the experimentation was to create a versatile model capable
of generating RGB images from newly seen grayscale images and to identify the components of an
autoencoder architecture that produce the best results. We tested a 3-layer convolutional model,
a linear latent model, and a CNN latent model, and determined that the 3-layer convolutional model
produced the best colorized reconstructions.


1 Introduction 


Throughout history, people have created brilliantly-colored architecture, clothing, paintings, and 
more. Any medium in which human creativity could be expressed witnessed an explosion of color. 
While we see their architecture as stark white in modern times, we know the ancient Romans bathed 
their statues, temples, and homes in splashes of bright colored paints, for example. Art movements 
throughout the centuries have always brought with them new trends and discoveries in color theory 
and creativity. Photography, then, became the next medium for popular colorful expression. 

Color photography has been possible for over one hundred years and has been widespread
throughout the world for over fifty. Widely used color camera technology reached the business
market in the mid-1960s, and in 1990 the first digital color camera was brought to the consumer
market. Since then, camera technology has improved to the point that the average person has a
12-48 megapixel camera in their pocket. With high-quality color photography so widely accessible,
the desire to colorize photos from before this time has only grown. This desire is achievable with
an autoencoder, and it is this desire that we sought to fulfill with this project.


2 Method 


An autoencoder, or convolutional encoder-decoder model, is a type of layered convolutional neural
network designed to compress and reconstruct an image. Ordinarily, the autoencoder seeks to predict
as its output the same image that it received as its input. For colorization, rather than the single
dataset typically used with a convolutional neural network, we use two paired datasets: the grayscale
images serve as the inputs, and the original color images serve as the targets. The encoder compresses
each grayscale image into a latent representation, and the decoder learns to rebuild the color image
from it using the color features learned during training. In this way, the autoencoder breaks down a
black-and-white image and builds up a color image.
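
The pairing of grayscale inputs with color targets described above can be sketched as follows. This is a minimal illustration rather than the exact project code: the helper name make_pairs and the random stand-in batch are assumptions made for the example, while the luma weights are the ones used in the listings of Section 5.1.

```python
import numpy as np

# Luma weights for RGB -> grayscale (the same weights the listings in Section 5.1 use).
LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def make_pairs(rgb_uint8):
    """Return (grayscale input, color target) arrays scaled to [0, 1].

    Hypothetical helper for illustration; it is not part of the original code.
    """
    target = rgb_uint8.astype(np.float32) / 255.0   # shape (N, H, W, 3)
    gray = target @ LUMA                            # shape (N, H, W)
    return gray[..., np.newaxis], target            # (N, H, W, 1), (N, H, W, 3)


# Assumption: random pixels stand in for a small batch of CIFAR-10 images.
rng = np.random.default_rng(0)
fake_batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

x_gray, y_color = make_pairs(fake_batch)
print(x_gray.shape, y_color.shape)
```

The model is then fit with x_gray as the input and y_color as the target, so the loss always compares a predicted color image against the true color image.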

As a team, we decided to each develop a slightly different architecture for the encoder and decoder 
and each test them independently using the same dataset. Once that was completed, we would compare 
and contrast the results from each architecture and determine which was best for reconstruction. 


3 Experiments 


3.1 Dataset 


The dataset we used for the autoencoder model was the CIFAR-10 dataset. It is composed of sixty
thousand 32x32 color images separated into ten classes, each with five thousand training and one
thousand test images. It is a sufficient and easily accessible dataset that is small enough to keep
training times manageable.


3.2 Evaluation metrics 


For each of the models that we trained and employed, we used the mean-squared error loss function 
to evaluate the disparity between the autoencoders’ output RGB images generated from their grayscale 
inputs and the original RGB images from which the grayscale images were derived. This method 
of evaluation allowed us to assess the autoencoders’ abilities to generate the correct proportions and 
intensities of the three color channels of the outputs. We also took a more visual approach to evaluating 
the performance of the models by performing a direct comparison between the original RGB images 
and the output RGB images as shown below. 
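
For the numerical half of this evaluation, the mean-squared error between a predicted RGB image and its original can be sketched in a few lines. This is an illustrative NumPy version under the assumption that pixel values are scaled to [0, 1]; the models themselves use the loss functions built into Keras and PyTorch, and the toy arrays below are made-up values.

```python
import numpy as np


def mse(predicted, target):
    """Mean-squared error averaged over every pixel and color channel."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((predicted - target) ** 2))


# Toy 2x2 RGB "images" with values in [0, 1] (made-up numbers for illustration).
target = np.zeros((2, 2, 3))
predicted = np.full((2, 2, 3), 0.1)

print(mse(predicted, target))
```

Because the error is averaged over all three channels, a model that recovers brightness well but predicts washed-out colors can still score a low loss, which is why the visual comparison is used alongside it.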


3.3 Results 


One of the implementations of the autoencoder, written in Keras, used three convolutional layers
as well as a dense layer in each of the encoder and decoder. The convolutional layers contain 64,
128, and 256 filters, respectively. This is a fairly standard implementation, and it produced quite
good results, reaching a minimum training loss of approximately 0.004 (Section 3.4.1).
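
To make the encoder's shape arithmetic concrete, the sketch below traces a 32x32 input through the three convolutional layers. It assumes a stride of 2 and 'same' padding for every layer, as in the listing of Section 5.1.1; the helper same_conv_output is a name introduced only for this example.

```python
import math


def same_conv_output(size, stride):
    """Spatial size after a convolution with 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


size = 32                       # CIFAR-10 images are 32x32
layer_filters = [64, 128, 256]  # filters per encoder layer
shapes = []
for filters in layer_filters:
    size = same_conv_output(size, stride=2)
    shapes.append((size, size, filters))

# Number of values fed into the dense layer after flattening the last feature map.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes, flattened)
```

Each layer halves the spatial resolution, so the dense layer receives a flattened 4x4x256 feature map, and the decoder mirrors these steps to return to 32x32.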

Below are some images from the CIFAR-10 dataset next to the resultant colorized images produced 
by the autoencoder. 


[Figure panels: Test Images (Color); Predicted Colorized Images]

Figure 1: Images from the RGB test dataset, the grayscale test dataset, and the colorized results,
respectively, from the original model.

Figure 2: Output from the autoencoder architecture without a dynamically changing learning rate.


Further implementations of the autoencoder model attempted to identify the potential difference 
in performance between a fully connected (Dense) latent representation and a convolutional latent 
representation within the autoencoder architecture. 

Two models were employed for such testing. One model had a fully-connected latent representation 
of a Linear layer consisting of 256 output nodes, while the other model had a fully convolutional latent 
representation consisting of 256 3x3 filters. For the fully connected latent representation, a second 
layer was required to upscale back to the number of neurons required for convolutional computation. 

Below are the outputs of the two autoencoder models, implemented in PyTorch.

Figure 3: Test input and colorized output from the autoencoder architecture containing a
fully-connected latent representation.

Figure 4: Test input and colorized output from the autoencoder architecture containing a CNN
representation.


3.4 Analysis and discussions 


Upon examining the results of all of the models, it became more apparent which components of an
autoencoder’s architecture are most necessary to transform grayscale images into RGB format. The
convolutional latent representation performed much better than the fully connected latent
representation and also produced clearer images than the original configuration (Figure 1). The
original configuration, consisting of both convolutional layers and dense layers (similar to model 2),
did, however, manage to capture more color than the other two models.

The following analysis describes in detail each of the architectures that were employed throughout
the experimentation as well as the change in the loss values during training. We can examine this
information to determine each model’s efficiency in learning to take the input grayscale images
and generate their RGB counterparts.


3.4.1 Model Architecture and Training


Figure 5: First implementation of the autoencoder using three convolutional layers.

Figure 6: Loss values of the first model over the 30 epochs for which it was trained.

Figure 7: Loss values of the first model over the 30 epochs for which it was trained with a static
learning rate.

Figure 8: Architecture of the autoencoder model containing a Linear latent representation.

Figure 9: Loss values of the FC latent space model over the 20 epochs for which it was trained.

Figure 10: Architecture of the autoencoder model containing a convolutional latent representation.

Figure 11: Loss values of the CNN latent space model over the 20 epochs for which it was trained.


As shown in Figures 5 and 6, the first architecture was capable of achieving a minimum loss on the 
training set of approximately 0.004, allowing it to generate the RGB images as shown in the results. 


3.4.2 Comparisons 


Figures 1-4 display the difference between having a fully connected and a convolutional latent space
representation within the autoencoder. The CNN architecture managed to achieve a minimum loss
comparable to that of the first model at approximately 0.005, while the FC architecture struggled to
even get below 0.01. The implications of this difference are demonstrated in the output, where the
output of the model with the convolutional latent space is much more defined and clear than that of
the model with the linear latent space. Additionally, the images reconstructed using the model with
the three convolutional layers are noticeably more vibrant than their counterparts from the other
architectures.

It is also worth noting that the model containing the fully-connected latent space had significantly 
more parameters than the other models that were employed, causing it to take more time to train 
during experimentation. The model with the convolutional latent space, on the other hand, had 
even fewer parameters than the first model but still achieved a comparable loss, suggesting that 
further modification to the architecture could yield similar or better results on the test set with a fully 
convolutional model. 
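
The difference in parameter counts can be checked with simple arithmetic on the latent layers alone. The sketch below assumes the layer sizes shown in the listing of Section 5.1.3 (two Linear layers between a 256x8x8 feature map and a 256-unit code, versus a single 3x3 convolution with 256 input and 256 output channels); the helper names are introduced only for this example, and the remaining encoder and decoder layers are not counted.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


# Fully connected latent code: 256*8*8 -> 256 -> 256*8*8
fc_latent = linear_params(256 * 8 * 8, 256) + linear_params(256, 256 * 8 * 8)

# Convolutional latent code: 256 channels -> 256 channels with 3x3 filters
conv_latent = conv2d_params(256, 256, 3)

print(fc_latent, conv_latent)
```

A convolution's parameter count does not depend on the spatial size of its input, whereas the fully connected layers grow with every pixel of the flattened feature map, which accounts for most of the gap.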

Overall, it’s clear that the 3-convolutional layer architecture produced the best-looking results. 
While its loss was comparable to that of the CNN latent space model, both at the 20 epoch mark and 
what could be extrapolated to be the loss beyond that mark for the latter, the apparent reconstruction 
was leagues ahead in the former. The performance of the linear latent representation architecture, 
then, is insufficient in both categories. 


3.4.3 Discussion 


The experimentation with the 3-layer autoencoder and with fully connected (Dense) and convolutional
latent representations yielded significant insights into the performance and adaptability of these
models in grayscale image colorization.

The observation that a static learning rate caused a slight degradation in colorization quality
underlines the importance of adaptive learning strategies. Dynamically reducing the learning rate
when the loss becomes stationary for a certain number of epochs seems important for maintaining good
convergence and enhancing the intricacies of colorized outputs, although it makes only a minimal
difference compared with the differences between architectures.
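
The plateau-based adjustment described above can be sketched in plain Python. This is a toy re-implementation for illustration, not the Keras ReduceLROnPlateau callback itself: the factor, patience, and minimum learning rate match the values in the listing of Section 5.1.1, while the class name PlateauSchedule, the starting learning rate of 1e-3, and the loss values are assumptions.

```python
import math


class PlateauSchedule:
    """Toy plateau-based learning rate decay (illustrative, not the Keras callback)."""

    def __init__(self, lr, factor=math.sqrt(0.1), patience=5, min_lr=0.5e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = math.inf  # lowest loss seen so far
        self.wait = 0         # epochs since the last improvement

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # Loss has been flat for `patience` epochs: shrink the learning rate.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


schedule = PlateauSchedule(lr=1e-3)          # assumed starting learning rate
losses = [0.010, 0.008] + [0.008] * 5        # improvement, then five flat epochs
rates = [schedule.step(loss) for loss in losses]
print(rates[-1])
```

With these made-up losses the rate stays at its starting value until the fifth epoch without improvement, at which point it is multiplied by sqrt(0.1).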

The comparison between fully connected and convolutional latent spaces highlighted the superiority 
of convolutional architectures for capturing intricate spatial information. The convolutional latent 
space exhibited a more effective representation of image features by increasing the image quality 
compared to the fully connected latent representation. 

Despite variations in architecture and latent space representation, the robustness and performance 
of the conventional 3-layer autoencoder stood out. Its popularity stems from its ability to strike a 
balance between model complexity and efficiency, offering a favorable trade-off for grayscale image 
colorization tasks. The adaptability and reliability of this architecture in consistently producing
high-quality colorized outputs make it a preferred choice among the experimented configurations.


4 Conclusion 


4.1 Description 


In this work, we investigate the viability of multiple different autoencoder architectures for the 
colorization of grayscale images. We have tested a 3-layer model with and without modifications to its 
learning rate, and two latent space models, both linear and CNN-composed. While the fixed learning 
rate slightly impacted colorization quality, the comparative analysis underscored the effectiveness of the 
3-layer autoencoder architecture. The superior performance of the convolutional latent space reaffirmed 
its significance in preserving spatial information crucial for accurate colorization. Future research 
focusing on hybrid architectures integrating dynamic learning rates with convolutional latent spaces 
holds promise for further enhancing colorization fidelity. These insights serve as valuable guidelines for 
optimizing autoencoder architectures for grayscale image colorization tasks, paving the way for more 
refined and realistic colorization techniques in the future. 


4.2 What I Learned 


Through the implementation of two separate models for the task of converting grayscale images
to RGB images, I learned about the impact that various architectures and hyperparameters may
have on the effectiveness of a neural network. The experimentation with the different architectures
and hyperparameters was extensive, and having an efficient way to change architectures between
training runs is very important for such research.

I also learned that convolutional networks are far more powerful for processing visual information 
compared to similar models utilizing linear layers. When comparing the number of parameters and 
accuracy between models employing CNNs and FC layers, the models with CNN layers are far more 
efficient for certain tasks and are more robust. 


5 Contribution 


5.1 Code 


Below is the code for the different architectures examined in this work. 


5.1.1 Three Convolutional Layer Implementation With Learning Rate Modification 


import numpy as np
import matplotlib.pyplot as plt
import os

from keras.layers import Dense, Input
from keras.layers import Conv2D, Flatten
from keras.layers import Reshape, Conv2DTranspose
from keras.models import Model
from keras.callbacks import ReduceLROnPlateau
from keras.callbacks import ModelCheckpoint
from keras.datasets import cifar10
from keras import backend as K

# Load in the images from the cifar10 dataset.
(x_train, _), (x_test, _) = cifar10.load_data()

# Turn the images in the train and test datasets to grayscale for new datasets
x_train_gray = np.dot(x_train[..., :3], [0.299, 0.587, 0.114])
x_test_gray = np.dot(x_test[..., :3], [0.299, 0.587, 0.114])

# Get the size of the images
img_rows = x_train.shape[1]
img_cols = x_train.shape[2]
channels = x_train.shape[3]

# PLOTTING THE DATASETS

# Store 100 images of the color test dataset
datasetC = x_test[:100]
datasetC = datasetC.reshape((10, 10, img_rows, img_cols, channels))
datasetC = np.vstack([np.hstack(i) for i in datasetC])

# Plot 100 images of the color test dataset
plt.figure()
plt.axis('off')
plt.title('Test Images (Color)')
plt.imshow(datasetC, interpolation='none')
plt.show()

# Store 100 images of the grayscale test dataset
datasetG = x_test_gray[:100]
datasetG = datasetG.reshape((10, 10, img_rows, img_cols))
datasetG = np.vstack([np.hstack(i) for i in datasetG])

# Plot 100 images of the grayscale test dataset
plt.figure()
plt.axis('off')
plt.title('Test Images (Grayscale)')
plt.imshow(datasetG, interpolation='none', cmap='gray')
plt.show()

# Normalize and reshape the color datasets
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, channels)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, channels)

# Normalize and reshape the grayscale datasets
x_train_gray = x_train_gray.astype('float32') / 255
x_test_gray = x_test_gray.astype('float32') / 255
x_train_gray = x_train_gray.reshape(x_train_gray.shape[0], img_rows, img_cols, 1)
x_test_gray = x_test_gray.reshape(x_test_gray.shape[0], img_rows, img_cols, 1)

# Define parameters
input_shape = (img_rows, img_cols, 1)
batch_size = 32
kernel_size = 3
latent_dim = 256
layer_filters = [64, 128, 256]

# Build the model
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs

# 3 Convolution layers: 64 filters, 128 filters, 256 filters
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)

# Reshape to 4x4x256 for processing, to eventually be shaped back into 32x32x3
shape = K.int_shape(x)

# Thank you for resources online on how to do this type of thing.
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)
encoder = Model(inputs, latent, name='encoder')
encoder.summary()

latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)

# Three convolution layers: 256 filters, 128 filters, 64 filters
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)

outputs = Conv2DTranspose(filters=channels,
                          kernel_size=kernel_size,
                          activation='sigmoid',
                          padding='same',
                          name='decoder_output')(x)

decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()

# Save as a new model any time the loss improves.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'Model{epoch:02d}.h5'
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)

# Reduce learning rate by sqrt(0.1) if the loss does not improve in 5 epochs
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               verbose=1,
                               min_lr=0.5e-6)

# Save weights
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_loss',
                             verbose=1,
                             save_best_only=True)

# Mean Square Error (MSE) loss function, Adam optimizer
autoencoder.compile(loss='mse', optimizer='adam')

# Called every epoch
callbacks = [lr_reducer, checkpoint]

# Train the autoencoder
autoencoder.fit(x_train_gray,
                x_train,
                validation_data=(x_test_gray, x_test),
                epochs=30,
                batch_size=batch_size,
                callbacks=callbacks)

# Predict the autoencoder output from test data
x_decoded = autoencoder.predict(x_test_gray)

# Display the 1st 100 colorized images
datasetGtoC = x_decoded[:100]
datasetGtoC = datasetGtoC.reshape((10, 10, img_rows, img_cols, channels))
datasetGtoC = np.vstack([np.hstack(i) for i in datasetGtoC])
plt.figure()
plt.axis('off')
plt.title('Predicted Colorized Images')
plt.imshow(datasetGtoC, interpolation='none')
plt.show()


5.1.2 Three Convolutional Layer Implementation Without Learning Rate Modification 


"""Colorization autoencoder

Used to train gray scale images of CIFAR-10 dataset to make them colorized
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.layers import Conv2D, Flatten
from tensorflow.keras.layers import Reshape, Conv2DTranspose
from tensorflow.keras.models import Model
# Imported from the public tensorflow.keras path rather than tensorflow.python.keras
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import plot_model
from tensorflow.keras import backend as K

import numpy as np
import matplotlib.pyplot as plt
import os


def rgb2gray(rgb):
    """Convert from color image (RGB) to grayscale."""
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])


# load the CIFAR10 data
(x_train, _), (x_test, _) = cifar10.load_data()

# input image dimensions
# we assume data format "channels_last"
img_rows = x_train.shape[1]
img_cols = x_train.shape[2]
channels = x_train.shape[3]

# create saved_images folder
imgs_dir = 'saved_images'
save_dir = os.path.join(os.getcwd(), imgs_dir)
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)

# display the 1st 100 input images (color and gray)
imgs = x_test[:100]
imgs = imgs.reshape((10, 10, img_rows, img_cols, channels))
imgs = np.vstack([np.hstack(i) for i in imgs])
plt.figure()
plt.axis('off')
plt.title('Test color images (Ground Truth)')
plt.imshow(imgs, interpolation='none')
plt.savefig('%s/test_color.png' % imgs_dir)
plt.show()

# convert color train and test images to gray
x_train_gray = rgb2gray(x_train)
x_test_gray = rgb2gray(x_test)

# display grayscale version of test images
imgs = x_test_gray[:100]
imgs = imgs.reshape((10, 10, img_rows, img_cols))
imgs = np.vstack([np.hstack(i) for i in imgs])
plt.figure()
plt.axis('off')
plt.title('Test gray images (Input)')
plt.imshow(imgs, interpolation='none', cmap='gray')
plt.savefig('%s/test_gray.png' % imgs_dir)
plt.show()

# normalize output train and test color images
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# normalize input train and test grayscale images
x_train_gray = x_train_gray.astype('float32') / 255
x_test_gray = x_test_gray.astype('float32') / 255

# reshape images to row x col x channel for CNN output/validation
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, channels)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, channels)

# reshape images to row x col x channel for CNN input
x_train_gray = x_train_gray.reshape(x_train_gray.shape[0], img_rows, img_cols, 1)
x_test_gray = x_test_gray.reshape(x_test_gray.shape[0], img_rows, img_cols, 1)

# network parameters
input_shape = (img_rows, img_cols, 1)
batch_size = 32
kernel_size = 3
latent_dim = 256
# encoder/decoder number of CNN layers and filters per layer
layer_filters = [64, 128, 256]

# build the autoencoder model
# first build the encoder model
inputs = Input(shape=input_shape, name='encoder_input')
x = inputs
# stack of Conv2D(64)-Conv2D(128)-Conv2D(256)
for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)

# shape info needed to build decoder model so we don't do hand computation
# the input to the decoder's first Conv2DTranspose will have this shape
# shape is (4, 4, 256) which is processed by the decoder back to (32, 32, 3)
shape = K.int_shape(x)

# generate a latent vector
x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)

# instantiate encoder model
encoder = Model(inputs, latent, name='encoder')
encoder.summary()

# build the decoder model
latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)

# stack of Conv2DTranspose(256)-Conv2DTranspose(128)-Conv2DTranspose(64)
for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)

outputs = Conv2DTranspose(filters=channels,
                          kernel_size=kernel_size,
                          activation='sigmoid',
                          padding='same',
                          name='decoder_output')(x)

# instantiate decoder model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()

# autoencoder = encoder + decoder
# instantiate autoencoder model
autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()

# prepare model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'colorized_ae_model.{epoch:03d}.h5'
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)

# save weights for future use (e.g. reload parameters w/o training)
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_loss',
                             verbose=1,
                             save_best_only=True)

# Mean Square Error (MSE) loss function, Adam optimizer
autoencoder.compile(loss='mse', optimizer='adam')

# called every epoch
callbacks = [checkpoint]

# train the autoencoder
autoencoder.fit(x_train_gray,
                x_train,
                validation_data=(x_test_gray, x_test),
                epochs=30,
                batch_size=batch_size,
                callbacks=callbacks)

# predict the autoencoder output from test data
x_decoded = autoencoder.predict(x_test_gray)

# display the 1st 100 colorized images
imgs = x_decoded[:100]
imgs = imgs.reshape((10, 10, img_rows, img_cols, channels))
imgs = np.vstack([np.hstack(i) for i in imgs])
plt.figure()
plt.axis('off')
plt.title('Colorized test images (Predicted)')
plt.imshow(imgs, interpolation='none')
plt.savefig('%s/colorized.png' % imgs_dir)
plt.show()

5.1.3 Linear and Convolutional Latent Space Implementations 


#!/usr/bin/enu python 
# coding: utf-8 


# In[1]: 


import time 

import torch 

import torch.nn as nn 

10 import torch.nn.functional as F 

11 import matplotlib.pyplot as plt 

12 import numpy as np 

13 import random 

14 from torchsummary import summary 


ANoOnBRWNH 


Ke) 


17 # In[2]: 


20 # Information about device 

21 print (torch.cuda.device_count () ) 

22 print (torch.cuda.current_device()) 
23 print (torch. cuda. device (0) ) 

24 print (torch.cuda.get_device_name (0) ) 


26 use_cuda = torch.cuda.is_available() 

27 print (use_cuda) 

28 # Set proper device based on cuda availability 

29 device = torch.device("cuda" if use_cuda else "cpu") 
30 print("Torch device selected: ", device) 


33 # In[3]: 


36 # Construct Autoencoder 
37 class Neti(nn.Module): 


38 def __init__(self): 
39 super(Neti, self).__init__@ 
40 


# Define layers of the autoencoder neural network 


41 
42 
43 # Encoder 

44 self.conv1 = nn.Conv2d(1, 64, 3, padding=1) 

45 self.conv2 = nn.Conv2d(64, 128, 3, padding=1) 
46 self.maxpooli = nn.MaxPool2d (2) 

AT self.conv3 = nn.Conv2d(128, 256, 3, padding=1) 
48 self.maxpool2 = nn.MaxPool2d (2) 

49 

50 


# Code (Latent Representation) 


51 self.fc1 = nn.Linear (256*8*8, 256) 

52 self.fc2 = nn.Linear(256, 256*8*8) 

53 

54 # Decoder 

55 self.upsample1 = nn.Upsample(scale_factor=2, mode=’bilinear’) 
56 self.deconvi = nn.ConvTranspose2d(256, 128, 3, padding=1) 

57 self.upsample2 = nn.Upsample(scale_factor=2, mode=’bilinear’) 
58 self.deconv2 = nn.ConvTranspose2d(128, 64, 3, padding=1) 

59 self.deconv3 = nn.ConvTranspose2d(64, 3, 3, padding=1) 

60 

61 def forward(self, X): 

62 

63 # Encoder 

64 output = F.relu(self.convi(X)) 
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output 
output 


# Latent 
output 
output 
output 
output 


Repres 


torch. 


# Decoder 


output 
output 
output 


return 


output 


# Construct Autoencoder 
class Net2(nn.Module): 


def 


def 


# In[4]: 


super (Net2, sel 
# Define layers 
# Encoder 
self.convi = nn 
self.conv2 = nn 
self.maxpooli = 
self.conv3 = nn. 
# Code (Latent 
self.conv4 = nn 


init__(self): 


# Decoder 


self. 
self. 
self. 
self. 


forward(self, 


deconvi 
upsamplel 
deconv2 
deconv3 


X 


# Encoder 


output 
output 
output 


F.relu 


entation 


reshape (output , 


torch.flatten(output ,1) # Flatten 
F.relu(self.fc1(output)) 
F.relu(self.fc2(output)) 
(-1, 


256, 8, 8) 


F.relu(self.maxpooli(self.conv2 (output) )) 
F.relu(self .maxpool2(self.conv3 (output) )) 


) # Reshape 


F.relu(self.deconvi(self.upsamplel (output))) 
F.relu(self .deconv2(self.upsample2 (output) )) 
F.sigmoid(self.deconv3 (output) ) 


of the autoencoder neural network 


f).__init__© 
.Conv2d(1, 64, 3, 
.Conv2d(64, 128, 


nn.MaxPool2d (2) 
Conv2d (128, 256, 


Representation) 
.Conv2d (256, 256, 


nn.ConvTranspose2d (256, 
nn.Upsample(scale_factor=2, 


nn. ConvTranspose2d (128, 
nn.ConvTranspose2d (64, 


Ja 


(self.conv1(X)) 


# Latent Representation 


output 


self.c 


# Decoder 


output 
output 
output 


return 


output 


import torchvision 
import torchvision.transforms as transforms 
import torch.optim as optim 


from torchvision.transforms.functional 


# In[5]: 


# Transforms 
train_transform 


trans 


onv4 (output ) 


forms .Compose( 


[transforms.ToTensor(), 


padding=1) 


3, 


3, 


3, 


padding=1) 


padding=1) 


padding=1) 


128, 3, 


64, 
3, 3, 


3, 


F.relu(self.deconvi (output) ) 
F.relu(self.deconv2(self.upsamplel (output) )) 
F.sigmoid(self.deconv3 (output) ) 


15 


padding=1) 


mode=’bilinear’) 


padding=1) 


padding=1) 


F.relu(self.maxpooli(self.conv2 (output) )) 
F.relu(self.conv3 (output) ) 


import rgb_to_grayscale 


200 


transforms.RandomRotation(15)] 


) 


test_transform = transforms.Compose( 
[transforms.ToTensor ()] 


) 


# Load in CIFAR10 data, 


train_data = 


test_dat 


ay = 


splitting training and test data and applying transforms 


torchvision.datasets.CIFAR10("./", train= True, transform = 
train_transform, download = True) 
torchvision.datasets.CIFAR10("./", train= False, transform = 
test_transform, download = True) 


# Create dataloaders for loading in data in batches of size mini_batch_size 
mini_batch_size = 100 
= torch.utils.data.DataLoader(train_data, batch_size=mini_batch_size, 
shuffle=True, num_workers=2) 
= torch.utils.data.DataLoader(test_data, batch_size=mini_batch_size, 
shuffle=False, num_workers=2) 


train_lo 


test_loa 


# In[6]: 


# Define 


loss 


ader 


der 


the training function 

def train(model, num_epochs, optimizer, loss_func): 
# Set model to training mode 
model.train() 


es = 


[] 


for i in range(num_epochs): 
total_loss = 0 


Loss 


for 


(images, _) in train_loader: 

images = images.to(device) # Push tensors to GPU 
grayscale_images = rgb_to_grayscale(images) # Convert 
optimizer.zero_grad() # Zero the gradients 

outputs = model(grayscale_images) # Forward pass 

loss = loss_func(outputs, images) # Calculate loss 


loss.backward() # Backpropagation 
optimizer.step() # Update weights 
total_loss += loss.item() # Accumulate loss 


# Save loss for graphing 


loss 


print(f’(Epoch {i+1}) Training Loss: 


sit 


es.append(total_loss / len(train_loader)) 


est (model)}’) 


return losses 


images to grayscale 


{total_loss / len(train_loader)} Test 


# Define function for applying model to test data and returning the accuracy 
def test (model): 
# Set model to evaluation mode 
model.eval() 
total_loss = 0 


# Test on test set 

for (images, _) in test_loader: 

images = images.to(device) # Push tensors to GPU 
grayscale_images = rgb_to_grayscale(images) # Convert images to grayscale 
outputs = model(grayscale_images) # Forward pass 


loss 


= F.mse_loss(outputs, images) # Calculate loss 


total_loss += loss.item() # Accumulate loss 


return total_loss/len(test_loader) 


# In[7]: 


201 # Define hyperparameters 
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learning_rate = 1e-4 
num_epochs = 20 


# In[10]: 


# Create the model with FC latent code
net1 = Net1().to(device)


# Print a summary of the model WIP
summary(net1, (1, 32, 32))


# Define the loss function and optimizer
loss_func = nn.MSELoss()
optimizer = optim.Adam(net1.parameters(), lr=learning_rate)


# Load model, if saved parameters exist
training_losses = []  # Stays empty when saved parameters are loaded
try:
    net1.load_state_dict(torch.load("./saved models/net1.pth"))
except FileNotFoundError:
    # Perform training
    training_losses = train(net1, num_epochs, optimizer, loss_func)

    # Save model
    torch.save(net1.state_dict(), './saved models/net1.pth')


# Prepare model for evaluation
net1.eval()


# In[11]: 


# Plot the losses over epochs for model 1
plt.plot(training_losses)
plt.title("Training Loss vs Epochs")
plt.xlabel("# Epochs")
plt.ylabel("Loss")
plt.show()


# In[12]: 


# Create the model with CNN latent code
net2 = Net2().to(device)


# Print a summary of the model WIP
summary(net2, (1, 32, 32))


# Define the loss function and optimizer
loss_func = nn.MSELoss()
optimizer = optim.Adam(net2.parameters(), lr=learning_rate)


# Load model, if saved parameters exist
training_losses = []  # Stays empty when saved parameters are loaded
try:
    net2.load_state_dict(torch.load("./saved models/net2.pth"))
except FileNotFoundError:
    # Perform training
    training_losses = train(net2, num_epochs, optimizer, loss_func)

    # Save model
    torch.save(net2.state_dict(), './saved models/net2.pth')


# Prepare model for evaluation
net2.eval()


# In[13]:


# Plot the losses over epochs for model 2
plt.plot(training_losses)
plt.title("Training Loss vs Epochs")
plt.xlabel("# Epochs")
plt.ylabel("Loss")
plt.show()


# In[ ]:


# Grab images for testing
test_images, _ = next(iter(test_loader))


# In[ ]:


### Test Net1 ###

# Display images to be tested
display_test = np.array([transforms.ToPILImage()(img) for img in test_images])
display_test = display_test.reshape((10, 10, 32, 32, 3))
display_test = np.vstack([np.hstack(i) for i in display_test])
plt.figure()
plt.axis('off')
plt.title('Test Images (RGB)')
plt.imshow(display_test, interpolation='none')
plt.show()

# Convert RGB test images into grayscale
grayscale_images = rgb_to_grayscale(test_images).to(device)

# Display images in grayscale
display_test = np.array([transforms.ToPILImage()(img) for img in grayscale_images])
display_test = display_test.reshape((10, 10, 32, 32, 1))
display_test = np.vstack([np.hstack(i) for i in display_test])
plt.figure()
plt.axis('off')
plt.title('Test Images (Grayscale)')
plt.imshow(display_test, interpolation='none', cmap='gray')
plt.show()

# Run model on grayscale images
output = net1(grayscale_images)

# Display images after decoding
display_test = np.array([transforms.ToPILImage()(img) for img in output])
display_test = display_test.reshape((10, 10, 32, 32, 3))
display_test = np.vstack([np.hstack(i) for i in display_test])
plt.figure()
plt.axis('off')
plt.title('Test Images (FC Code Autoencoder)')
plt.imshow(display_test, interpolation='none')
plt.show()


# In[ ]:


### Test Net2 ###

# Display images to be tested
display_test = np.array([transforms.ToPILImage()(img) for img in test_images])
display_test = display_test.reshape((10, 10, 32, 32, 3))
display_test = np.vstack([np.hstack(i) for i in display_test])
plt.figure()
plt.axis('off')
plt.title('Test Images (RGB)')
plt.imshow(display_test, interpolation='none')
plt.show()

# Convert RGB test images into grayscale
grayscale_images = rgb_to_grayscale(test_images).to(device)

# Display images in grayscale
display_test = np.array([transforms.ToPILImage()(img) for img in grayscale_images])
display_test = display_test.reshape((10, 10, 32, 32, 1))
display_test = np.vstack([np.hstack(i) for i in display_test])
plt.figure()
plt.axis('off')
plt.title('Test Images (Grayscale)')
plt.imshow(display_test, interpolation='none', cmap='gray')
plt.show()

# Run model on grayscale images
output = net2(grayscale_images)

# Display images after decoding
display_test = np.array([transforms.ToPILImage()(img) for img in output])
display_test = display_test.reshape((10, 10, 32, 32, 3))
display_test = np.vstack([np.hstack(i) for i in display_test])
plt.figure()
plt.axis('off')
plt.title('Test Images (Fully CNN Autoencoder)')
plt.imshow(display_test, interpolation='none')
plt.show()


5.2 My Contribution 


My contribution was primarily the typical three convolutional layer autoencoder implementation.
I researched various possible implementations for autoencoders, knowing that was the direction we were
going to take as a team, and noticed that many examples utilized three convolutional layers. As
such, I deemed it sufficient as something of a baseline implementation. Following the model of various
implementations discovered through my research, I included a way to nudge the loss along and
improve the efficiency of my code by reducing the learning rate slightly if the loss plateaus.
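
The plateau rule described above can be illustrated with a minimal, framework-free sketch. The class name `PlateauLRReducer` and the values chosen for `factor`, `patience`, and `min_delta` are assumptions made for this illustration only; the code in section 5.1.1 may differ. PyTorch provides `torch.optim.lr_scheduler.ReduceLROnPlateau`, which applies the same idea directly to an optimizer.

```python
class PlateauLRReducer:
    """Hypothetical helper: shrink the learning rate when the loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=2, min_delta=1e-4, min_lr=1e-6):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiplier applied on a plateau (assumed value)
        self.patience = patience    # non-improving epochs tolerated before reducing
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.min_lr = min_lr        # lower bound on the learning rate
        self.best = float("inf")    # best (lowest) loss seen so far
        self.bad_epochs = 0         # consecutive epochs without improvement

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best - self.min_delta:
            # The loss improved: remember it and reset the plateau counter.
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                # Plateau detected: reduce the rate, but never below min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


# Illustrative per-epoch losses (made up for this sketch, not measured results).
reducer = PlateauLRReducer(lr=1e-4)
for epoch, epoch_loss in enumerate([0.020, 0.015, 0.015, 0.015, 0.015, 0.010], start=1):
    new_lr = reducer.step(epoch_loss)
    print(f"(Epoch {epoch}) loss={epoch_loss:.3f} lr={new_lr:.1e}")
```

In a training loop, `step` would be called once per epoch with the average epoch loss, and the returned value would be written into the optimizer's learning rate before the next epoch.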


In addition to the standard three convolutional layer autoencoder, I was also responsible
for sections 1, 2, and 3.1: the Introduction, Method, and Dataset sections, respectively. Additionally,
I contributed to sections 3.3, 3.4, 4, and 5.1 (Results, Analysis and Discussions, Conclusion, and
Code), just as the others did. Specifically, I contributed the code for 5.1.1, Three Convolutional Layer
Implementation With Learning Rate Modification.